Integrated transcription and identification of named entities in broadcast speech
Abstract
This paper presents an approach to integrating both transcription and named entity (NE) identification into a large vocabulary continuous speech recognition system. It builds on the NE-tagged language modelling approach, which was recently applied to the development of a statistical NE annotation system. We also present results for a proper name identification experiment using the Hub-4 evaluation data.
Similar resources
Multimedia interaction for the new millennium
Spoken language processing has created value in multiple application areas such as document transcription, data base entry, and command and control. Recently scientists have been focusing on a new class of application that promises on-demand access to multimedia information such as radio and broadcast news. In separate research, augmenting traditional graphical interfaces with additional modali...
Real-time rich-content transcription of Chinese broadcast news
This paper describes the recent development of an Audio Indexing System for Chinese (Mandarin) broadcast news. Key issues in the three major components (automatic speech recognition, speaker identification and named entity extraction) are addressed. The Chinese-language-specific challenges are discussed and our solutions are described. The recognition accuracy of the final system is comparable t...
OOV Sensitive Named-Entity Recognition in Speech
Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...
Robust Named Entity Extraction from Large Spoken Archives
Traditional approaches to Information Extraction (IE) from speech input simply consist of applying text-based methods to the output of an Automatic Speech Recognition (ASR) system. While this works well on transcripts with a low Word Error Rate (WER), we believe that a tighter integration of the IE and ASR modules can increase IE performance in more difficult conditions. More specifically thi...
Named Entity Recognition on Transcribed Broadcast News Guidelines for Participants
In the Named Entity Recognition (NER) task, systems are required to recognize different types of Named Entities (NEs) in Italian texts. As in the previous editions of EVALITA, we distinguish four NE types: Person (PER), Organization (ORG), Location (LOC) and Geo-Political Entities (GPE). Participant systems should identify both the correct extension and type of each NE. The output of participan...